1 Abstract

This reports analyzes the information presented in the data set Sales Videogames and its main objective is to show the process of data cleaning and present the main insights and recommendations for the industry to thrive in the years to come.

2 Problem Setup

2.1 Introduction

2.1.1 Overview

Other than fun for children, youth and adults, video games have become one of the most profitable and commercially attractive industries of this decade. In 2017 alone, the global video game market exceeded $ 100 billion, a figure that marked a milestone in the history of this segment, registering a growth of more than 50% in five years. The market is led by video games for mobile devices (42%), followed by video games for consoles (31%) and for computers (27%), which include physical, downloadable and browser video games.

Currently, video game companies around the world, in addition to attracting and entertaining the consumer, direct their strategies towards the development of comprehensive entertainment systems so that, in collaboration with other industries, they can monetize the interest of the traditional public and even the new consumer segments. In other words, the profits are generated not only by the purchase of the console and the video game, but also include mechanisms such as subscriptions fees of entertainment networks; selling tickets for special events, conventions or tournaments; complementary physical accessories; and integrated shopping and advertising, to name just a few.

Recently, a student from Tecnologico de Monterrey was hired to clean, transform, merge, process, and analyze a dataset which allows them to create a platform to visualize which video games have been the best-selling in the last few years as well as other interesting exploratory and descriptive analysis. This tool will be offered to developer companies in order to identify user preferences and thus, be able to make decisions that open the opportunities for them building greater profitability for their business.

To develop this dataset, the student collected information that includes a video game dataset from 1977 to the end of the second decade of the 21st century. The variables included are: name of the videogame, the genre of the video game, the company that published it (publisher), the console where it is played, as well as the sales generated in different regions of the world by the video game on its launch date (in millions of units), a measure of consumer rating to the videogame, and whether the game won an award.

However, the content of the dataset is not cleaned and the information contains errors that could cause inappropriate decision-making for the end user of the platform. Therefore, the analyst must first clean, transform, merge, and pre-process the dataset in order to be able to analyze it then.

2.1.2 Research Process

To do the pertinent cleaning of the dataset we used RStudio, using a Markdown file set to HTML. The analysis and ploting was also done in the same R environment.

2.2 Objectives

This project is divided into 2 big objectives.

  • The first one is to clean the dataset presented so it becomes useful and we can preform a proper analysis.
  • The second one is to use different statistical measures and graphic representations to perfom an analysis that descrives the overview of the video game sales behavior in the last 40 years. As pasr of this objective the following questionings must be solved:
    • Average sales of video games in the last decades in North America, Europe, Japan and worldwide
    • Year in which global video game sales were at its highest
    • Most successful genre of all time and for each region
    • Game console that has released the majority of video games
    • Genre with the most video games
    • Publisher who has distributed the majority of video games
    • Most successful publisher by region and worldwide

3 Data

3.1 Data description

We used the dataset SalesVideogames to perform all the analysis. It contains 22 columns that can be summarized in information about the videogame and sales by region. The variables are:

  • Information about the videogame:
    • X: (Ordinal) This variable is just a ID for each name.
    • Rank: (Ordinal) This variable Ranks each videogame.
    • Name: (Charachter) The name of the videogame.
    • Platform: (Charachter) The game console that the game was published for.
    • Publisher: (Charachter) The company that published each videogame.
    • Action: (Numeric) When a videogame is from the category Action, it shows the year it was published.
    • Adventure: (Numeric) When a videogame is from the category Adventure, it shows the year it was published.
    • Fighting: (Numeric) When a videogame is from the category Fighting, it shows the year it was published.
    • Misc: (Numeric) When a videogame is from the category Misc, it shows the year it was published.
    • Platform.1: (Numeric) When a videogame is from the category Ploatform, it shows the year it was published.
    • Puzzle: (Numeric) When a videogame is from the category Puzzle, it shows the year it was published.
    • Racing: (Numeric) When a videogame is from the category Racing, it shows the year it was published.
    • RolePlaying: (Numeric) When a videogame is from the category Role Playing, it shows the year it was published.
    • Shooter: (Numeric) When a videogame is from the category Shooters, it shows the year it was published.
    • Simulation: (Numeric) When a videogame is from the category Simulation, it shows the year it was published.
    • Sports: (Numeric) When a videogame is from the category Sports, it shows the year it was published.
    • Strategy: (Numeric) When a videogame is from the category Strategy, it shows the year it was published.
  • Sales:
    • NA_Sales: (Numeric) This variable shows the value of sales for each videogame in the region of North America.
    • EU_Sales: (Numeric) This variable shows the value of sales for each videogame in the region of Europe.
    • JP_Sales: (Numeric) This variable shows the value of sales for each videogame in the region of Japan.
    • Other_Sales: (Numeric) This variable shows the value of sales for each videogame in Other regions.
    • Global_Sales: (Numeric) This variable shows the value of sales for each videogame Global.

3.2 Data cleaning conducted

As mentioned before, the dataset that was used for this job was untidy and had various errors that made it inappropriate for its use. So I could be able to get real insight, and as one of the objectives, a data cleaning process was done.

3.2.1 Gathering results

The first step done was to join all the categories columns (from Action to Strategy) into 1 that showed them all and created a column with the value fo the years. This action was performed with the function gather(). In the same me step, with the use of %>% (pipes), I replaced all the Platform.1 for Platform. This is because as it was previously a column and so there were no repeated names this has a name change, but as it becomes a value of a variable we can perform the change. Here I was also able to remove all the missing values from the variable year, leaving only the actual videogames released. Once created the new columns, I made sure that category was a factor and year an integer with the functions as.factor and as.integer respectively.

3.2.1.1 Before

Table 1.1 Structure of the Original Dataset
X Rank Name Platform Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Action Adventure Fighting Misc Platform.1 Puzzle Racing RolePlaying Shooter Simulation Sports Strategy
1 1 Wii Sports Wii Nintendo 41.49 29.02 3.77 8.46 82.74 NA NA NA NA NA NA NA NA NA NA 2006 NA
2 2 Super Mario Bros.  NES Nintendo 29.08 3.58 6.81 0.77 40.24 NA NA NA NA 1985 NA NA NA NA NA NA NA
3 3 Mario Kart Wii Wii Nintendo 15.85 12.88 3.79 3.31 35.82 NA NA NA NA NA NA 2008 NA NA NA NA NA
4 4 Wii Sports Resort Wii Nintendo 15.75 11.01 3.28 2.96 33.00 NA NA NA NA NA NA NA NA NA NA 2009 NA
5 5 Pokemon Red/Pokemon Blue GB Nintendo 11.27 8.89 10.22 1.00 31.37 NA NA NA NA NA NA NA 1996 NA NA NA NA
6 6 Tetris GB Nintendo 23.20 2.26 4.22 0.58 30.26 NA NA NA NA NA 1989 NA NA NA NA NA NA

3.2.1.2 After

Table 1.2 Structure of the New Dataset
X Rank Name Platform Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales category year
17 17 Grand Theft Auto V PS3 Take-Two Interactive 7.01 9.27 0.97 4.14 21.40 Action 2103
18 18 Grand Theft Auto: San Andreas PS2 Take-Two Interactive 9.43 0.40 0.41 10.57 20.81 Action 2004
24 24 Grand Theft Auto V X360 Take-Two Interactive 9.63 5.31 0.06 1.38 16.38 Action 2013
25 25 Grand Theft Auto: Vice City PS2 Take-Two Interactive 8.41 5.49 0.47 1.78 16.15 Action 2002
39 39 Grand Theft Auto III PS2 Take-Two Interactive 6.99 4.51 0.30 1.30 13.10 Action 2001
45 45 Grand Theft Auto V Online PS4 Take-Two Interactive 3.80 5.81 0.36 2.02 11.98 Action 2014

3.2.2 NA Removal

Following the data cleaning, I had to check the remaining numeric variables, that where all the sales. The first part here wass to find the missing values using the function is.na with all 5 variables. In this particular case, only the regions had 4 missing values each. to replace them, I used the mean for each one and change all of them with an ifelse() statement.

3.2.3 Removing outliers

The variable yearhad some outliers cause by a fail in the capturing. To find them first I made a point graph that clearly showed a pattern and those points that were not even close. Also a table showing the year and the name was created to identify the specific outliers and to be able to investigate the actual year that videogame was released. With these identified, using the str_replace_all() function, I was able to replace all those outliers. Finally, the same plot was made, but now it had no more outliers

3.2.3.1 With Outliers

Plot 1.1 Year Plot with Outliers

Plot 1.1 Year Plot with Outliers

3.2.3.2 Without Outliers

Plot 1.1 Year Plot with Outliers

Plot 1.1 Year Plot with Outliers

3.2.4 Cleaning characters

Next, I decided to use the functions str_trim() and str_to_lower() to solve som minor inconsistencies on our categorical variables. In this part I also replaced all xbox to xb in the platform column because they were refering to the same console.

3.2.5 Creating Datasets

Finally, I created 2 new datasets using gather() to group all regions into one variable. Both datasets are similar and have only one difference, the first one gathers only the regions but the second one also considered Global_Sales. This step was done to be more useful for the analitics process to perform.

3.2.5.1 Sales2

Table 2.1 Structure of the New Dataset
X Rank Name Platform Publisher Global_Sales category year Region Sales
17 17 grand theft auto v ps3 take-two interactive 21.40 Action 2013 NA_Sales 7.01
18 18 grand theft auto: san andreas ps2 take-two interactive 20.81 Action 2004 NA_Sales 9.43
24 24 grand theft auto v x360 take-two interactive 16.38 Action 2013 NA_Sales 9.63
25 25 grand theft auto: vice city ps2 take-two interactive 16.15 Action 2002 NA_Sales 8.41
39 39 grand theft auto iii ps2 take-two interactive 13.10 Action 2001 NA_Sales 6.99
45 45 grand theft auto v online ps4 take-two interactive 11.98 Action 2014 NA_Sales 3.80

3.2.5.2 Sales3

Table 2.2 Structure of the New Dataset
X Rank Name Platform Publisher category year Region Sales
17 17 grand theft auto v ps3 take-two interactive Action 2013 NA_Sales 7.01
18 18 grand theft auto: san andreas ps2 take-two interactive Action 2004 NA_Sales 9.43
24 24 grand theft auto v x360 take-two interactive Action 2013 NA_Sales 9.63
25 25 grand theft auto: vice city ps2 take-two interactive Action 2002 NA_Sales 8.41
39 39 grand theft auto iii ps2 take-two interactive Action 2001 NA_Sales 6.99
45 45 grand theft auto v online ps4 take-two interactive Action 2014 NA_Sales 3.80

4 Analysis and Insights

As part of our second objective, an analysis must be done to be able to answer the initial questions.

4.1 Statistical/Analytical techniques employed

All the analysis was performed on R, being able to get some statistical values such as the mean tosolve somo of the problems we had. To explain in a easier way the data, some tables and plots were used.

4.2 Results

In this section you will find the solving of the quiestioning proposed ans some insights found through them.

4.2.1 Average sales of video games in the last decades in North America, Europe, Japan and worldwide

Region avg_sales
Global_Sales 0.5374407
NA_Sales 0.2643281
EU_Sales 0.1463619
JP_Sales 0.0777769
Other_Sales 0.0480270

The first behavior we are able to observe here is that the average sales of videogames was at is peak before the 90’s. This is telling us that the most succesful decade, taking into account only the average of the sales, was the 80’s. Here we can also see that for most of the years, North America has been the most succesful region, being surpased clearly only by Japan. Basically, until 1996 they were the top contenders on this aspect, but as the average has decreased to our days, the top regions are North america and Europe. Now, talking about the all time mean sales, both North America and Europe, represent close to a 77% of the Global sales.

4.2.2 Year in which global video game sales were at its highest

Year Global
2008 690.56
2009 670.21
2007 621.08
2010 606.30
2011 528.68


With this graph and table we can see in a very clear way, how videogames sales behave through the years. According to the data, the year when the global sales where at its highest was 2008 but 2009 got really close. The greater sales were done from 2005 to 2011.
In general we can se that from the first years until 2008, the total Global sales follows a growing pattern that get to its peak and starts plummeting until 2017.
Another interesting finding in here was that the amount of sales follows a peculiar growth pattern. There is a curve formed by between 3 to 6 years in which there are ups and downs. It grows for a few tears, reach a peak and goes down again. This cycle repeats through out the hole graph, but we never get to see the behavior after the last decrease in sales in 2017.

4.2.3 Most successful genre of all time and for each region

4.2.3.1 By Year

The first insight found is that before the 90´s, the best selling videogames were fron the categories Platform and Puzzle. For the next decades, we can observe that in general the sales of all the regions have a similar behavior, with the exeption of Japan. For all The top categories from 1990 to 2017 were Action, Sports, Misc and Shooters in that order, but for japan it is more like Role Playing, Misc, and Action. Now, when we compare that to the global sales by category, we can see that the global pattern is the same as the one from the 3 cattegories that are not Japan.

4.2.3.2 All time

Region category SalesxRegion
NA_Sales Action 877.8300
EU_Sales Action 525.0000
JP_Sales RolePlaying 352.3100
Other_Sales Action 186.9261

Visually, the first thing that shows up is that the region of North America represents close to the half of the total sales. This graph confirms the asumpyions made by analyzing the most successful categories by year. For North America, Europe and Other regions the top selling genre was Action and the runner-up was Sports. In the case of Japan Role Playing got the first place followed by Action.
A very interesting detail is that for North America, Europe and Other shooters videogames were in the Top 3, but for Japan, Shooters are the least succesfull videogames category.

4.2.4 Game console that has released the majority of video games

The console that has released more videogames is the Nintendo DS with a total of 2163

The PlayStation2 released 2161 videogames being the second place only 2 releases behind


4.2.5 Genre with the most video games

The category that releases the most videogames is Action, followed by Sports and in 3rd place is Misc.

4.2.6 Publisher who has distributed the majority of video games

Top 1

Top 2

Top 3

Top 4

Top 5


4.2.7 Most successful publisher by region and worldwide

Region Publisher SalesxRegion
Global_Sales nintendo 1786.9100
NA_Sales nintendo 816.8700
JP_Sales nintendo 455.6878
EU_Sales nintendo 418.7400
Other_Sales electronic arts 129.7700

We can appreciate that for Global sales the most successful publisher is Nintendo, and it remains like this for all the regions except from others were EA wins. This surprises because Nintendo is not the company that released the more videogames, but is the number 7.

4.2.8 Total Sales vs Total Videogames published

The main insight that we get from this visualization is that succes may not only be measured By the amount of videogames published or the total sales. There are some companies that released even only 1 game, but when getting the relation it holds against the sales it got, they got to the top. The surprise here is that Electronic Arts, which is the company that releases the most videogames, is not even in the top 10. In the case of nintendo that have the most sales, it apears in the top bot it is not the best one. In general the best is palcom.

4.3 Conclusions and Business Recommendations

The data studied gave us some really important insights.

  • The fist thing to take into consideration is that Nintendo have a great strategy, because the win by having a lot of consoles and they release multiple games for them. This helps them to be in the top spot of sales in nearly every region. More companies can adopt this strategy and diversify their incomes. This can also create more loyal costumer throughout the years, without the need to be the ones that releases the most games.
  • The success of a genre change by region
    • North America, Europe, and Other have a similar behavior
      • The way to get a best seller is with Action, Sports or Shooters
      • Here the region with the most competitors is North Amética, but also is the one where more money is moved.
    • For Japan the main genre is Role Play and Action or Sports ar still good options, but Shooters are a no go

5 End Matter

5.1 Appendices

5.1.1 Previous Code

5.1.1.1 Libraries

library(dplyr) #for data manipulation
library(ggplot2) #for graphs and plots
library(treemapify) # a complement to ggplot
library(stringr) #string characters
library(tidyr) #manipulate datasets
library(visdat) #for visualizing missing data
library(assertive) #assert functions
library(forcats)  #Manipulate factors

5.1.1.2 Loading the Dataset

sales <- read.csv("SalesVideogames.csv")
glimpse(sales)

5.1.2 The code used to clean the dataset

5.1.2.1 Creating Categories and Year

sales <- sales %>% 
  gather(key="category", value="year", 11:22) %>% 
  mutate(category = str_replace_all(category, "Platform.1", "Platform")) %>% 
  filter(!is.na(year))
sales$category <- as.factor(sales$category)
sales$year <- as.integer(sales$year)
head(sales)

5.1.2.2 Removing NA on sales

# Looking for NA
sum(is.na(sales$NA_Sales))
sum(is.na(sales$EU_Sales))
sum(is.na(sales$JP_Sales))
sum(is.na(sales$Other_Sales))
sum(is.na(sales$Global_Sales))

# Summary of the dataset
summary(sales$NA_Sales)
summary(sales$EU_Sales)
summary(sales$JP_Sales)
summary(sales$Other_Sales)

# Replacing NA for Mean
sales <- sales %>% 
  mutate(NA_Sales = ifelse(is.na(NA_Sales),0.2643,NA_Sales)) %>% 
  mutate(EU_Sales = ifelse(is.na(EU_Sales),0.1464,EU_Sales)) %>% 
  mutate(JP_Sales = ifelse(is.na(JP_Sales),0.07778,JP_Sales)) %>% 
  mutate(Other_Sales = ifelse(is.na(Other_Sales),0.04803,Other_Sales))

# Confirmation
sum(is.na(sales$NA_Sales))
sum(is.na(sales$EU_Sales))
sum(is.na(sales$JP_Sales))
sum(is.na(sales$Other_Sales))

5.1.2.3 Remove outliers

# Visualizing Outliers
sales %>% 
  ggplot(aes(x=X, y=year)) +
  geom_point()

# Changing values
sales$year <- as.character(sales$year)
sales <- sales %>% 
  mutate(year = str_replace_all(year,"1797","1997")) %>% 
  mutate(year = str_replace_all(year,"1894","1984")) %>% 
  mutate(year = str_replace_all(year,"2103","2013")) %>% 
  mutate(year = str_replace_all(year,"2102","2012")) %>% 
  mutate(year = str_replace_all(year,"2130","2013")) %>% 
  mutate(year = str_replace_all(year,"2101","2011")) %>% 
  mutate(year = str_replace_all(year,"2200","2009")) %>% 
  mutate(year = str_replace_all(year,"2105","2015")) 
sales$year <- as.integer(sales$year)

# Checking
sales %>% 
  ggplot(aes(x=X, y=year)) +
  geom_point()

5.1.2.4 Correcting categorical variables

sales <- sales %>%
  mutate(Name = str_trim(Name)) %>% 
  mutate(Name = str_to_lower(Name)) %>% 
  mutate(Platform = str_trim(Platform)) %>% 
  mutate(Platform = str_to_lower(Platform)) %>% 
  mutate(Publisher = str_trim(Publisher)) %>% 
  mutate(Publisher = str_to_lower(Publisher)) %>% 
  mutate(Platform = str_replace_all(Platform, "xbox", "xb"))

5.1.2.5 Creating new datasets for future analysis

sales2 <- sales %>% 
  gather(key="Region", value="Sales", 6:9) 
sales2

sales3 <- sales %>% 
  gather(key="Region", value="Sales", 6:10) 
sales3

5.1.3 Code use to creat the analysis

5.1.3.1 Average sales of video games in the last decades in North America, Europe, Japan and worldwide

graph1 <- sales3 %>% 
  group_by(year,Region) %>% 
  summarise(avg_sales=mean(Sales)) %>% 
  arrange(-avg_sales)
graph1

graph1_1 <- sales3 %>% 
  group_by(Region) %>% 
  summarise(avg_sales=mean(Sales)) %>% 
  arrange(-avg_sales)
graph1_1

ggplot(graph1, aes(x=year, y=avg_sales, color=Region)) +
  geom_line()+
  labs(x="Year", y="Average Sales", title="Average sales by year by region")+
  theme_minimal()

5.1.3.2 Year in which global video game sales were at its highest

graph2 <- sales %>% 
  select(year, NA_Sales,EU_Sales, JP_Sales, Other_Sales, Global_Sales) %>% 
  group_by(year) %>% 
  summarise(Global=sum(Global_Sales)) %>% 
  arrange(-Global)
graph2

ggplot(graph2, aes(x=year, y=Global))+
  geom_line()+
  geom_point()+
  ylim(c(0,750))+
  labs(x="Year", y="Global Sales", title="Global Sales by year")+
  theme_light()

5.1.3.3 Most successful genre of all time and for each region

graph3 <- sales2 %>% 
  group_by(year,Region,category) %>% 
  summarise(SalesxRegion = sum(Sales))
graph3

graph3_1 <- sales2 %>% 
  group_by(Region,category) %>% 
  summarise(SalesxRegion = sum(Sales))
graph3_1

graph3_2 <- sales2 %>% 
  group_by(Region,category) %>% 
  summarise(SalesxRegion = sum(Sales))%>% 
  top_n(1)
graph3_2

ggplot(graph3, aes(x=category, y=SalesxRegion, fill=category))+
  geom_bar(stat = "identity")+
  facet_wrap("Region", scales = "free_y")+
  theme(axis.text.x = element_blank(),
            axis.ticks.x = element_blank(),
            axis.text.y = element_blank(),
            axis.ticks.y = element_blank())+
  labs(x="", y="Sales", title="Sales by Genre")

ggplot(graph3, aes(x=year, y=SalesxRegion, color=category))+
  geom_line()+
  facet_wrap("Region", scales = "free_y")+
  theme(panel.background = element_rect(fill="white", color = "white"),
        panel.grid.major = element_line(size = 0.25, linetype = "solid",color = "lightgrey"),
        panel.grid.minor = element_line(size = 0.12,linetype = "solid", color="lightgrey"), 
        plot.background = element_rect(fill = "lightgreen"),
        axis.text.x = element_blank(),
          axis.ticks.x = element_blank(),
          axis.ticks.y = element_blank())

ggplot(graph3_1,aes(area=SalesxRegion, fill=Region, label=category, subgroup=Region))+
  geom_treemap()+
  geom_treemap_text(colour = "black",
                    place = "centre",
                    size = 15)+
  ggtitle("Most successful genre of all time for each region")

5.1.3.4 Game console that has released the majority of video games

sales %>% 
  count(Platform) %>% 
  arrange(-n)

ggplot(sales, aes(x=Platform)) +
  geom_bar(fill="lightblue") +
  labs(x="Console", y="#Videogames Released", title="Videogames released by platform")+
  theme_bw()

5.1.3.5 Genre with the most video games

ggplot(sales, aes(x=category, fill=category)) +
  geom_bar() +
  labs(x="Category", y="#Videogames Released", title="Videogames released by platform")+
  theme_classic()+
  theme(axis.text.x = element_blank(),
            axis.ticks.x = element_blank(),
            axis.text.y = element_blank(),
            axis.ticks.y = element_blank())

5.1.3.6 Publisher who has distributed the majority of video games

graph4 <- sales %>% 
  group_by(Publisher) %>% 
  count() %>%
  filter(n>400) %>% 
  arrange(-n)
graph4

ggplot(graph4, aes(x=Publisher, y=n, fill=Publisher))+
  geom_bar(stat = "identity")+
  labs(x="Publisher", y="#Videogames Released", title="Videogames released by publisher")+
  theme_linedraw()+
  theme(axis.text.x = element_blank(),
            axis.ticks.x = element_blank(),
            axis.text.y = element_blank(),
            axis.ticks.y = element_blank())

5.1.3.7 Most successful publisher by region and worldwide

graph5 <- sales3 %>% 
  group_by(Region,Publisher) %>% 
  summarise(SalesxRegion=sum(Sales)) %>% 
  top_n(1) %>% 
  arrange(-SalesxRegion)
graph5

graph5_1 <- sales3 %>% 
  group_by(Region,Publisher) %>% 
  summarise(SalesxRegion=sum(Sales)) 
graph5_1

ggplot(graph5_1,aes(area=SalesxRegion, fill=Region, label=Publisher, subgroup=Region))+
  geom_treemap()+
  geom_treemap_text(colour = "black",
                    place = "centre",
                    size = 15)+
  scale_fill_brewer(palette = "Greens")

5.1.3.8 Total Sales vs Total Videogames published

graph6 <- sales3 %>% 
  group_by(Region,Publisher) %>% 
  summarise(VGReleased = n(), SalesxRegion = sum(Sales)) %>% 
  mutate(SalesbyReleases = SalesxRegion/VGReleased) %>% 
  arrange(-SalesbyReleases)

graph6_1 <- sales3 %>% 
  group_by(Region,Publisher) %>% 
  summarise(VGReleased = n(), SalesxRegion = sum(Sales)) %>% 
  mutate(SalesbyReleases = SalesxRegion/VGReleased) %>% 
  top_n(1) %>% 
  arrange(-SalesbyReleases)
graph6_1

graph6_2 <- sales3 %>% 
  group_by(Region,Publisher) %>% 
  summarise(VGReleased = n(), SalesxRegion = sum(Sales)) %>% 
  mutate(SalesbyReleases = SalesxRegion/VGReleased) %>% 
  top_n(10) %>% 
  arrange(-SalesbyReleases)
graph6_2

ggplot(graph6,aes(area=SalesbyReleases, fill=Region, label=Publisher, subgroup=Region))+
  geom_treemap()+
  geom_treemap_text(colour = "black",
                    place = "centre",
                    size = 15)+
  ggtitle("Most successful genre of all time for each region")+
  theme(plot.background = element_rect(fill = "#222222",color = "#222222"),
        legend.background = element_rect(fill = "#222222",color = "#222222"),
        plot.title = element_text(color="white"),
        legend.title = element_text(color="white"),
        legend.text = element_text(color="white"))

ggplot(graph6_2,aes(area=SalesbyReleases, fill=Region, label=Publisher, subgroup=Region))+
  geom_treemap(show.legend = FALSE)+
  geom_treemap_text(colour = "black",
                    place = "centre",
                    size = 15)+
  ggtitle("Most successful genre of all time for each region")+
  theme(plot.background = element_rect(fill = "#222222",color = "#222222"),
        legend.background = element_rect(fill = "#222222",color = "#222222"),
        plot.title = element_text(color="white"))

5.2 Reference list